{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Auto MPG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function that loads the `autompg` dataset into NumPy arrays."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> from mlxtend.data import autompg_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Auto-MPG dataset is intended for regression analysis. The target (`y`) is the miles per gallon (mpg) of 392 automobiles (6 rows containing \"NaN\"s have been removed). The 8 feature columns are:\n",
    "\n",
    "**Features**\n",
    "\n",
    "1. cylinders: multi-valued discrete \n",
    "2. displacement: continuous \n",
    "3. horsepower: continuous \n",
    "4. weight: continuous \n",
    "5. acceleration: continuous \n",
    "6. model year: multi-valued discrete \n",
    "7. origin: multi-valued discrete \n",
    "8. car name: string (unique for each instance)\n",
    "\n",
    "- Number of samples: 392\n",
    "\n",
    "- Target variable (continuous): mpg\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### References\n",
    "\n",
    "- Source: [https://archive.ics.uci.edu/ml/datasets/Auto+MPG](https://archive.ics.uci.edu/ml/datasets/Auto+MPG)\n",
    "- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example - Dataset overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dimensions: 392 x 8\n",
      "\n",
      "Header: ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']\n",
      "1st row [  8.00000000e+00   3.07000000e+02   1.30000000e+02   3.50400000e+03\n",
      "   1.20000000e+01   7.00000000e+01   1.00000000e+00              nan]\n"
     ]
    }
   ],
   "source": [
    "from mlxtend.data import autompg_data\n",
    "X, y = autompg_data()\n",
    "\n",
    "print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))\n",
    "print('\\nHeader: %s' % ['cylinders', 'displacement',\n",
    "                           'horsepower', 'weight', 'acceleration',\n",
    "                           'model year', 'origin', 'car name'])\n",
    "print('1st row', X[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the feature array contains a `str` column (\"car name\"), so it is recommended to select only the features you need and convert them into a `float` array for further analysis. The example below shows how to drop the `car name` column (the last column) and cast the remaining NumPy array to `float`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[   8. ,  307. ,  130. , ...,   12. ,   70. ,    1. ],\n",
       "       [   8. ,  350. ,  165. , ...,   11.5,   70. ,    1. ],\n",
       "       [   8. ,  318. ,  150. , ...,   11. ,   70. ,    1. ],\n",
       "       ..., \n",
       "       [   4. ,  135. ,   84. , ...,   11.6,   82. ,    1. ],\n",
       "       [   4. ,  120. ,   79. , ...,   18.6,   82. ,    1. ],\n",
       "       [   4. ,  119. ,   82. , ...,   19.4,   82. ,    1. ]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X[:, :-1].astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "## autompg_data\n",
      "\n",
      "*autompg_data()*\n",
      "\n",
      "Auto MPG dataset.\n",
      "\n",
      "\n",
      "\n",
      "- `Source` : https://archive.ics.uci.edu/ml/datasets/Auto+MPG\n",
      "\n",
      "\n",
      "- `Number of samples` : 392\n",
      "\n",
      "\n",
      "- `Continuous target variable` : mpg\n",
      "\n",
      "\n",
      "    Dataset Attributes:\n",
      "\n",
      "    - 1) cylinders:  multi-valued discrete\n",
      "    - 2) displacement: continuous\n",
      "    - 3) horsepower: continuous\n",
      "    - 4) weight: continuous\n",
      "    - 5) acceleration: continuous\n",
      "    - 6) model year: multi-valued discrete\n",
      "    - 7) origin: multi-valued discrete\n",
      "    - 8) car name: string (unique for each instance)\n",
      "\n",
      "**Returns**\n",
      "\n",
      "- `X, y` : [n_samples, n_features], [n_targets]\n",
      "\n",
      "    X is the feature matrix with 392 auto samples as rows\n",
      "    and 8 feature columns (6 rows with NaNs removed).\n",
      "    y is a 1-dimensional array of the target MPG values.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with open('../../api_modules/mlxtend.data/autompg_data.md', 'r') as f:\n",
    "    print(f.read())"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
